Abstract We study the quantum version of a decision tree classifier to fill the gap between quantum computation and machine learning. The quantum entropy impurity criterion, which is used to determine which node should be split, is presented in the paper. By using the quantum fidelity measure between two quantum states, we cluster the training data into subclasses so that the quantum decision tree can manipulate quantum states. We also propose algorithms for constructing the quantum decision tree and for searching for a target class over the tree for a new quantum object.

Keywords Quantum information processing • Quantum entropy • Quantum decision tree • Quantum classification • Machine learning


1 Introduction 

Machine learning (ML) and pattern recognition aim to generate classifying expressions simple enough to be understood easily by a human. The problem of searching for patterns in data is a fundamental one and has a long and successful history [1,2]. Most research in pattern recognition concerns methods for supervised or unsupervised learning [3]. Classic decision tree classification belongs to the supervised learning methods. On the other hand, quantum information processing (QIP) has made much progress in recent years [4]. Quantum information is a natural generalization of classical information. It is based on quantum mechanics, the most accurate and complete description of the world. However, it is quite different from its classical counterpart, since quantum versions of classical algorithms exhibit different characteristics. This paper concerns quantum classification.

In classical machine learning, statistical classification is a supervised learning procedure in which individual objects are assigned to groups based on quantitative information on one or more characteristics inherent in the objects and on a training set of previously labeled objects. Top-down induction of a decision tree is a powerful and simple method of pattern classification. Formally, the problem can be stated as follows: given a set of classes containing $m$ values, $C = \{c_1, c_2, \ldots, c_m\}$, a set of training data containing $n$ objects is described as $\{(x_1, y_1), (x_2, y_2), \ldots, (x_i, y_i), \ldots, (x_n, y_n)\}$, where $x_i$ is a vector of $d$ attributes and $y_i \in C$ is the class label corresponding to the object $x_i$. The attribute set of the input objects can be denoted by $A = \{a_1, a_2, \ldots, a_d\}$. For each attribute $a_i \in A$, its domain value set is $V_{a_i} = \{v_{i,1}, v_{i,2}, \ldots, v_{i,m_i}\}$, where $m_i$ is the cardinality of $V_{a_i}$. The goal of classification is to develop an optimal classification rule that can determine the class of any object from its attribute values. Using the classifier, we can find the class $y \in C$ for a new object $v$.

Decision tree classifiers are used successfully in many diverse areas such as data mining, radar signal classification, character recognition, remote sensing, medical diagnosis, expert systems, and speech recognition. A classical decision tree classifier is a learning method for approximating discrete-valued target functions. Decision trees can also easily be re-represented as sets of if-then rules [5]. The problem of decision tree classification can be decomposed into two subproblems: (1) generating an optimal decision tree classifier with minimum generalization error and (2) determining the class of unseen objects with as high an accuracy as possible. Since the second problem is trivial for classical decision tree classifiers, most researchers have focused on the first. The key task in the first problem is the selection of a node splitting rule. The most common splitting criteria include information gain, the Gini index, the chi-squared statistic, and distance measures.
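
For reference, two of the classical criteria named above can be computed as follows. This is a minimal Python sketch for discrete labels; the helper names `entropy`, `gini`, and `information_gain` are our own and hypothetical. It shows how a splitting rule compares candidate attributes by the impurity that remains after the split.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy impurity: -sum_i p_i log2 p_i over the class proportions p_i."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index: 1 - sum_i p_i^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    """Entropy reduction obtained by splitting `labels` on one discrete attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    expected = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - expected

labels = ["yes", "yes", "no", "no"]
outlook = ["sunny", "sunny", "rain", "rain"]   # separates the two classes
windy = [True, False, True, False]            # carries no class information
print(entropy(labels), gini(labels))           # 1.0 0.5
print(information_gain(labels, outlook))       # 1.0
print(information_gain(labels, windy))         # 0.0
```

A top-down learner would split on `outlook` here, since it leaves no impurity in either branch.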

QIP promises parallel information processing and rapid search over data, potentially yielding significant advances in the whole field of computer science. Combining quantum computation with machine learning or artificial intelligence (AI) is therefore an exciting prospect. The area of quantum machine learning, which investigates counterparts of classical learning methods in quantum systems, has attracted recent interest [6-11]. Several papers have already reported how QIP could be used to construct a quantum classifier. Ezhov [12] and Ventura [13] represented each labeled object in the training data as a basis state in a superposition with some coefficient. They take a basis state as the initial state and iterate it using Grover's algorithm [14]. The classification result was obtained by measuring the system after an appropriate number of iterations. The approach of Ref. [13] generalized poorly because of the uncertainty in the number of iterations required. Both [12] and [13] were limited to binary classification. Schützhold [15] presented a quantum pattern recognition algorithm that was more abstract than statistical classification. In Ref. [16], an algorithm for quantum data classification was developed that referred to quantum state distinction or detection rather than pattern classification; furthermore, it only coped with labeled items. In Ref. [17], the classification of two arbitrary unknown mixed qubit states is studied. To the best of our knowledge, however, little progress has been made in quantum classification so far.

The aim of this paper is to consider the quantum version of a decision tree classifier that can also deal with multiclass situations. The remainder of the paper is organized as follows. Section 2 devises a quantum decision tree classifier, in which the von Neumann entropy is used for splitting the object space and a quantum clustering method is proposed to discretize the quantum attribute states. Section 3 presents the algorithm for searching over the quantum decision tree; according to the quantum fidelity measure, unseen quantum data can be classified into a certain class. This is also useful for predicting quantum states. A brief conclusion is drawn in Sect. 4.

2 Quantum decision tree classifier model

2.1 Problem statement 

A decision tree classifier learns from a training dataset that contains observations about objects, which are either obtained empirically or acquired from experts. In a quantum world, the training data consist of quantum objects instead of classical observations on classical data.

A quantum training dataset with $n$ quantum data pairs can be described as $D = \{(|x_1\rangle, |y_1\rangle), (|x_2\rangle, |y_2\rangle), \ldots, (|x_n\rangle, |y_n\rangle)\}$, where $|x_i\rangle$ is the $i$th quantum object of the training dataset and $|y_i\rangle$ is the known class state corresponding to the quantum state $|x_i\rangle$. We call the set of all example quantum states $X = \{|x_1\rangle, |x_2\rangle, \ldots, |x_i\rangle, \ldots, |x_n\rangle\}$ the sample set, and the set of all quantum class states $Y = \{|y_1\rangle, |y_2\rangle, \ldots, |y_i\rangle, \ldots, |y_n\rangle\}$ is called the class sample set. We also call a class state $|y_i\rangle$ the target attribute state.

A quantum state $|x_i\rangle$ is represented by a $d$-dimensional attribute vector (or attribute state), $|x_i\rangle = (|x_{1,i}\rangle, |x_{2,i}\rangle, \ldots, |x_{d,i}\rangle)$, depicting $d$ measurements made on the tuple from the $d$ attributes $a_1, a_2, \ldots, a_d$, respectively. For attribute $a_i$, where $i = 1, 2, \ldots, d$, its domain value set $V_{a_i}$ is $\{|v_{i,1}\rangle, |v_{i,2}\rangle, \ldots, |v_{i,m_i}\rangle\}$, where $|v_{i,j}\rangle$ is the $j$th basis state and $m_i$ is its cardinality. These basis states span a Hilbert space $S_i$. Any quantum state $|\phi\rangle$ in the space $S_i$ can be described by a superposition of the basis states:

$$|\phi\rangle = \sum_{j=1}^{m_i} c_{i,j} |v_{i,j}\rangle \qquad (1)$$

The coefficients $c_{i,j}$ may be complex, with $\sum_{j=1}^{m_i} |c_{i,j}|^2 = 1$. The set of all possible input objects is called the instance space, which is defined as the tensor product of all input attributes' quantum systems: $S = S_1 \otimes S_2 \otimes \cdots \otimes S_d$.
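
A small numerical illustration of Eq. (1) and of the instance space follows. It is a sketch only: it assumes numpy, and it assumes that an object's attribute states are combined as a product state in $S = S_1 \otimes \cdots \otimes S_d$; `attribute_state` and `object_state` are hypothetical helper names.

```python
import numpy as np

def attribute_state(coeffs):
    """Coefficient vector of |phi> = sum_j c_{i,j} |v_{i,j}> from Eq. (1)."""
    phi = np.asarray(coeffs, dtype=complex)
    if not np.isclose(np.vdot(phi, phi).real, 1.0):
        raise ValueError("coefficients must satisfy sum_j |c_{i,j}|^2 = 1")
    return phi

def object_state(attribute_states):
    """Assumed product-state embedding of (|x_{1,i}>, ..., |x_{d,i}>) in S_1 ⊗ ... ⊗ S_d."""
    state = np.array([1.0 + 0.0j])
    for phi in attribute_states:
        state = np.kron(state, phi)
    return state

# Hypothetical object with d = 2 attributes: a qubit attribute (m_1 = 2)
# and a qutrit attribute (m_2 = 3), so dim S = 2 * 3 = 6.
a1 = attribute_state([1 / np.sqrt(2), 1j / np.sqrt(2)])
a2 = attribute_state([0, 1, 0])
x = object_state([a1, a2])
print(x.shape)                              # (6,)
print(np.isclose(np.linalg.norm(x), 1.0))   # True
```

The dimension of $S$ is the product $m_1 m_2 \cdots m_d$, which is why the instance space grows quickly with the number of attributes.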

We assume that $C_b = \{|c_{b1}\rangle, |c_{b2}\rangle, \ldots, |c_{bm}\rangle\}$ is the set of $m$ basis states that describe the class state; $|c_{bi}\rangle$ is called a class basis state, where $i = 1, 2, \ldots, m$. These class basis states span a Hilbert space $S_c$ (called the class space). A class state $|y_j\rangle$ can be described by a superposition of the class basis states: $|y_j\rangle = \sum_{i=1}^{m} a_i |c_{bi}\rangle$, where $\sum_{i=1}^{m} |a_i|^2 = 1$. Suppose the universal set of distinct class states is $C = \{|c_1\rangle, |c_2\rangle, \ldots, |c_i\rangle, \ldots, |c_m\rangle\}$, where $|c_i\rangle \in S_c$. For arbitrary class states $|c_i\rangle \in C$ and $|c_j\rangle \in C$, we have $|c_i\rangle \ne |c_j\rangle$ if $i \ne j$.

Obviously, the sample set belongs to the instance space, i.e., $X \subset S$. We also have $Y \subset S_c$. In this paper, we restrict every attribute state $|x_{ij}\rangle$ and class state $|y_k\rangle$ to pure states.

Given a training dataset $D$ with attribute set $A = \{a_1, a_2, \ldots, a_i, \ldots, a_d\}$, for each attribute $a_i \in A$ we denote the set of its attribute states by $D_i$, where $D_i = \{|x_{i1}\rangle, |x_{i2}\rangle, \ldots, |x_{in}\rangle\}$. The training dataset $D$ can then be rewritten as $D = \{D_1^T, D_2^T, \ldots, D_d^T, Y^T\}$.

The goal is to form a decision tree classifier that can be used to predict a previously unseen object by explicitly assigning it to a specific class state. More precisely, a decision tree classifier $t$ is designed using the training quantum states and the corresponding class states. Suppose we are given finitely many copies of a new pure state $|x_{new}\rangle \in S$. By searching over the quantum decision tree $t$, we can obtain a precise class state $|y_{new}\rangle \in C$ corresponding to the quantum state $|x_{new}\rangle$.

In a quantum world, learning is more difficult for a classification algorithm than in a classical world, because quantum mechanics forbids us to obtain two or more identical copies of an unknown quantum state. In this paper, this constraint is relaxed by considering the case where multiple copies of the state to be learned or to be classified are available (see the quantum template matching problem of [18]). Therefore, in the remainder of the paper, both the training state $|x_{ij}\rangle$ and the state to be classified $|x_{new}\rangle$ have multiple copies.


2.2 Quantum decision trees

The term quantum decision tree is sometimes used to refer to a quantum query algorithm or quantum black-box algorithm (for instance, Grover's algorithm [14]), which computes a function $f: \{0,1\}^n \to \{0,1\}$ with the help of quantum superpositions [19,20]. Such quantum algorithms are not in fact trees. In contrast, our quantum decision tree is a real tree, like the classical version.

In classical trees, attributes may be either discrete, with values drawn from a known set of possible values, or continuous, with values that are real numbers. In the classical setting, the outcomes of the test nodes can simply be the values of discrete attributes or intervals of continuous attributes. However, this method is invalid for quantum decision trees, since the values of an attribute can be superposition states, which results in too many distinct data values. Simply splitting the training dataset according to these distinct discrete values would generate an overtrained classifier and a larger tree. In this paper, for each attribute we partition, or cluster, the attribute states into several subclasses according to the fidelity between quantum states. For each subclass of an attribute, we calculate its centroid, which represents the subclass.

Similar to the classical decision tree, a quantum counterpart $R$ consists of nodes that form a rooted tree, meaning it is a directed tree with a node called the root that has no incoming edges. All other nodes have exactly one incoming edge. A node with outgoing edges is called an internal or test node, which represents an attribute $a_i \in A$. All other nodes are called leaves or decision nodes, which belong to $C$ (the set of class states). In a quantum decision tree, each internal node splits the training dataset into two or more subsets according to a certain discrete function of the input attribute states. In this paper, each test considers a single attribute, so that the training dataset is partitioned according to the subclasses of that attribute. Each leaf is assigned to one class representing the most appropriate target attribute state. A quantum decision tree classifies objects by sorting them down the tree from the root to some leaf node, which provides the classification of the object. An object is classified by starting at the root node of the tree, testing the attribute specified by this node, and then moving down the tree branch corresponding to the subclass of the attribute in the given object. This process is then repeated for the subtree rooted at the new node.

Consider a quantum decision tree $R$ derived from the training dataset $D = \{(|x_1\rangle, |y_1\rangle), (|x_2\rangle, |y_2\rangle), \ldots, (|x_n\rangle, |y_n\rangle)\} = \{D_1^T, D_2^T, \ldots, D_d^T, Y^T\}$. Each node in $R$ is also a tree. Any node $t$ in $R$ is described as $t = \{D^{(t)}, a_i, \{t_{c1}, t_{c2}, \ldots, t_{ct_i}\}\}$, where $D^{(t)}$ is the set of training data in node $t$, $a_i \in A$, and $\{t_{c1}, t_{c2}, \ldots, t_{ct_i}\}$ is the set of its $t_i$ subnodes. The tree $t$ is split into $t_i$ subtrees according to the attribute $a_i$. Let t.attribute denote the attribute of node $t$; then t.attribute $= a_i$. For the root $R$, $D^{(R)} = D$.

For a node $t$, $D^{(t)} = \{D_1^{(t)}, D_2^{(t)}, \ldots, D_d^{(t)}, Y^{(t)}\}$. Suppose t.attribute $= a_i$; we divide $D_i^{(t)}$ into $t_i$ clusters $D_{i1}^{(t)}, D_{i2}^{(t)}, \ldots, D_{it_i}^{(t)}$, where $D_{ij}^{(t)} \subseteq D_i^{(t)}$. Each cluster $D_{ij}^{(t)}$ has a centroid denoted by $|xc_{ij}^{(t)}\rangle$, so the set of centroids for attribute $a_i$ at node $t$ is $XC_i^{(t)} = \{|xc_{i1}^{(t)}\rangle, |xc_{i2}^{(t)}\rangle, \ldots, |xc_{it_i}^{(t)}\rangle\}$.

Then each item $D_j^{(t)} \in D^{(t)}$, where $j \ne i$, is divided into clusters $D_{j1}^{(t)}, D_{j2}^{(t)}, \ldots, D_{jt_i}^{(t)}$ as follows. If a state $|x_{ik}\rangle \in D_i^{(t)}$ is assigned to the cluster $D_{il}^{(t)}$, then we assign the state $|x_{jk}\rangle \in D_j^{(t)}$ to the cluster $D_{jl}^{(t)}$, where $1 \le l \le t_i$. The training dataset is then partitioned into $t_i$ subsets according to the set of centroids $XC_i^{(t)}$; that is, node $t$ is split into $t_i$ descendant nodes $t_{c1}, t_{c2}, \ldots, t_{cj}, \ldots, t_{ct_i}$. Meanwhile, the set of target attribute states $Y^{(t)}$ is also partitioned into $t_i$ subsets $Y_i^{(t)} = \{Y_{i1}^{(t)}, Y_{i2}^{(t)}, \ldots, Y_{it_i}^{(t)}\}$, where the superscript $t$ and subscript $i$ indicate that the class states are partitioned by attribute $a_i$ at node $t$. For a descendant node $t_{cj}$, we generate an edge labeled $|xc_{ij}^{(t)}\rangle$ linking node $t$ to node $t_{cj}$. This process is then repeated for each subtree of $t$.
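
The index-aligned partitioning rule described above can be sketched classically. In this minimal sketch, states are stand-in strings, `split_by_attribute` is a hypothetical helper, and cluster indices are assumed to run from 0 to $t_i - 1$; only the bookkeeping is shown, not the quantum clustering itself.

```python
def split_by_attribute(columns, labels, cluster_of):
    """Partition D^(t) into t_i child datasets D^(t_c1), ..., D^(t_ct_i).

    columns   : list of d attribute columns; columns[j][k] stands for |x_{jk}>
    labels    : list of class states; labels[k] stands for |y_k>
    cluster_of: cluster_of[k] = l means |x_{ik}> was put in cluster D_{il}
                for the splitting attribute a_i
    Row k of every other column D_j follows row k of D_i into cluster l.
    """
    n_clusters = max(cluster_of) + 1
    children = []
    for l in range(n_clusters):
        rows = [k for k, c in enumerate(cluster_of) if c == l]
        child_columns = [[col[k] for k in rows] for col in columns]
        child_labels = [labels[k] for k in rows]
        children.append((child_columns, child_labels))
    return children

# Hypothetical node with d = 2 attributes and n_t = 4 objects.
columns = [["|0>", "|0>", "|1>", "|+>"], ["|a>", "|b>", "|c>", "|d>"]]
labels = ["|y1>", "|y1>", "|y2>", "|y2>"]
children = split_by_attribute(columns, labels, cluster_of=[0, 0, 1, 1])
print(children[1])  # ([['|1>', '|+>'], ['|c>', '|d>']], ['|y2>', '|y2>'])
```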

To construct a quantum decision tree, we need to (1) decide which attribute to test at each node in the tree, selecting the attribute that is most useful for classifying objects (the node splitting criterion is discussed in Sect. 2.3), and (2) cluster the attribute states of the selected attribute into appropriate clusters (discussed in Sect. 2.4).

2.3 Node splitting criterion

Much research in designing decision trees focuses on determining which attribute test should be performed at each node. The fundamental principle underlying tree creation is simplicity: we prefer decisions that lead to a simple, compact tree with few nodes. According to Occam's razor, the simplest model that explains the data is to be preferred. To this end, at each node we look for an attribute test that makes the subsidiary decision trees as simple as possible. In classical algorithms, the most popular criterion is the entropy impurity:

$$\mathrm{Entropy}(t) = -\sum_{i=1}^{m_t} p_i \log_2 p_i,$$

where $p_i$ is the proportion of the objects at node $t$ belonging to class $i$ and $m_t$ is the total number of classes at node $t$.

Classical node splitting criteria do not work in the quantum world, since the classes can be in superposition states. We present a new criterion, the quantum entropy impurity, to measure the attributes. Consider a quantum decision tree or subtree $t$ with the set of class states $Y^{(t)} = \{|y_1^{(t)}\rangle, |y_2^{(t)}\rangle, \ldots, |y_{n_t}^{(t)}\rangle\}$ belonging to $t$, where $Y^{(t)} \subseteq Y$ and $n_t$ is the cardinality of $Y^{(t)}$. Let $Y^{(dt)} = \{|y_1^{(dt)}\rangle, |y_2^{(dt)}\rangle, \ldots, |y_{n_{dt}}^{(dt)}\rangle\}$ denote the set of distinct class states of $Y^{(t)}$, where $n_{dt}$ is the cardinality of $Y^{(dt)}$; then $Y^{(dt)} \subseteq Y^{(t)}$, $Y^{(dt)} \subseteq C$, and $n_t \ge n_{dt}$. For any class states $|y_i^{(dt)}\rangle, |y_j^{(dt)}\rangle \in Y^{(dt)}$, we have $|y_i^{(dt)}\rangle \ne |y_j^{(dt)}\rangle$ if $i \ne j$. The quantum entropy impurity of node $t$ can be defined as

$$S(\rho) = -\mathrm{tr}(\rho \log \rho) \qquad (2)$$

where $\rho = \sum_{i=1}^{n_{dt}} p_i^{(dt)} |y_i^{(dt)}\rangle\langle y_i^{(dt)}|$ is the average state, or density operator, of $Y^{(t)}$, and $p_i^{(dt)}$ is the fraction of the states in $Y^{(t)}$ that are equal to $|y_i^{(dt)}\rangle$.

Equation (2) defines the quantum entropy, or von Neumann entropy, of the quantum state $\rho$. When the target attribute states in $Y^{(dt)}$ are all orthogonal, the definition coincides with the classical case.
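
Equation (2) can be checked numerically on small examples. The sketch below assumes numpy and a base-2 logarithm (so that entropy is measured in qubits); `density_operator` and `von_neumann_entropy` are hypothetical helper names. It builds $\rho$ from the class states at a node and evaluates $S(\rho)$ from the eigenvalues of $\rho$, illustrating the orthogonal case, where it reduces to the classical entropy impurity, and a non-orthogonal case.

```python
import numpy as np

def density_operator(class_states):
    """rho = sum_i p_i^(dt) |y_i><y_i| for the class states Y^(t) at a node.

    Averaging over every (possibly repeated) state in Y^(t) with weight 1/n_t
    gives the same rho as weighting each distinct state by its fraction.
    """
    states = [np.asarray(y, dtype=complex) for y in class_states]
    return sum(np.outer(y, y.conj()) for y in states) / len(states)

def von_neumann_entropy(rho):
    """S(rho) = -tr(rho log2 rho) from the eigenvalues of rho, with 0 log 0 = 0."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return max(0.0, float(-np.sum(w * np.log2(w))))

zero, one = np.array([1, 0]), np.array([0, 1])
plus = np.array([1, 1]) / np.sqrt(2)

# Orthogonal class states: S equals the classical entropy impurity (1 bit).
print(round(von_neumann_entropy(density_operator([zero, zero, one, one])), 6))  # 1.0
# Non-orthogonal class states overlap, so the impurity is lower.
print(round(von_neumann_entropy(density_operator([zero, zero, plus, plus])), 3))  # 0.601
# All class states identical: zero impurity.
print(round(von_neumann_entropy(density_operator([plus, plus, plus])), 6))  # 0.0
```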

Given a partial quantum tree down to node $t$ with the set of class states $Y^{(t)} = \{|y_1^{(t)}\rangle, |y_2^{(t)}\rangle, \ldots, |y_{n_t}^{(t)}\rangle\}$, the key problem is which attribute we should choose for the attribute test. If the expected splitting attribute at node $t$ is $a_i$, the attribute states belonging to $D_i^{(t)}$ are partitioned into $t_i$ subsets $D_{i1}^{(t)}, D_{i2}^{(t)}, \ldots, D_{it_i}^{(t)}$, and the set of class states $Y^{(t)}$ will then be partitioned into $Y_i^{(t)} = \{Y_{i1}^{(t)}, Y_{i2}^{(t)}, \ldots, Y_{ij}^{(t)}, \ldots, Y_{it_i}^{(t)}\}$, where $Y_{ij}^{(t)} \subseteq Y^{(t)}$ contains those target attribute states in $Y^{(t)}$ whose attribute states belong to $D_{ij}^{(t)}$. The quantum entropy of $Y_{ij}^{(t)}$ is denoted by $S(\rho_{ij}^{(t)})$. Then the expected quantum entropy of the system after node $t$ is split using attribute $a_i$ is

$$S_e(\rho_i^{(t)}) = \sum_{j=1}^{t_i} p_j S(\rho_{ij}^{(t)}) \qquad (3)$$

where $\rho_{ij}^{(t)}$ is the density operator of $Y_{ij}^{(t)}$, which represents the set of class states of the $j$th expected subnode of attribute $a_i$ at node $t$, and $p_j = |Y_{ij}^{(t)}|/n_t$ is the probability of state $\rho_{ij}^{(t)}$. The probabilities sum to 1, i.e., $\sum_{j=1}^{t_i} p_j = 1$.
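
A minimal numerical sketch of Eq. (3) follows, assuming numpy, a base-2 logarithm, and pure class states given as vectors; `node_entropy` and `expected_entropy` are hypothetical names. The class states at node $t$ are grouped by a candidate attribute's cluster indices, and the subnode entropies are weighted by $p_j = |Y_{ij}^{(t)}|/n_t$.

```python
import numpy as np

def node_entropy(class_states):
    """S(rho) (base 2) for the uniform mixture of the class states at one (sub)node."""
    rho = sum(np.outer(y, np.conj(y)) for y in class_states) / len(class_states)
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return max(0.0, float(-np.sum(w * np.log2(w))))

def expected_entropy(class_states, cluster_of):
    """S_e(rho_i^(t)) = sum_j p_j S(rho_ij^(t)) with p_j = |Y_ij^(t)| / n_t (Eq. (3))."""
    n_t = len(class_states)
    total = 0.0
    for j in sorted(set(cluster_of)):
        subset = [y for y, c in zip(class_states, cluster_of) if c == j]
        total += len(subset) / n_t * node_entropy(subset)
    return total

zero, one = np.array([1.0, 0.0]), np.array([0.0, 1.0])
Y = [zero, zero, one, one]
# A hypothetical attribute a_1 separates the class states; a_2 mixes them.
print(round(expected_entropy(Y, [0, 0, 1, 1]), 6))  # 0.0
print(round(expected_entropy(Y, [0, 1, 0, 1]), 6))  # 1.0
```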

Quantum entropy specifies (asymptotically, per member) the minimum number of qubits of information needed to encode the classification of an arbitrary member of $Y^{(t)}$. The smaller the quantum entropy, the fewer qubits are required, and the smaller or simpler the quantum decision tree. For each attribute, we calculate its expected quantum entropy for node $t$, which gives $d$ expected quantum entropies for node $t$: $S_e(\rho_1^{(t)}), S_e(\rho_2^{(t)}), \ldots, S_e(\rho_i^{(t)}), \ldots, S_e(\rho_d^{(t)})$. We choose the attribute whose expected quantum entropy $S_e(\rho_i^{(t)})$ is the minimum among these values as the splitting attribute for node $t$.

We modify the algorithm proposed in [21] to find the splitting attribute with the minimum expected quantum entropy. We call the algorithm for finding the splitting node Select_Splitting_Node; its outline is as follows:

Algorithm Select_Splitting_Node(Tree t)
    choose $i$ uniformly at random from $\{1, 2, \ldots, d\}$;
    set $S_{e\min} = S_e(\rho_i^{(t)})$;
    repeat
        use Grover's algorithm to search for $j$ where $S_e(\rho_j^{(t)}) < S_{e\min}$;
        if search succeeds then
            set $S_{e\min} = S_e(\rho_j^{(t)})$;
            set $i = j$;
        else
            return $i$;
        end if
    end repeat

First, the algorithm randomly chooses an index $i$ and calculates the expected quantum entropy of attribute $a_i$, i.e., $S_e(\rho_i^{(t)})$, which is taken as the initial minimum value. Second, Grover's algorithm is used to find a new index $j$ such that the expected quantum entropy of attribute $a_j$ is smaller than the current minimum value $S_{e\min}$. If such an index $j$ is found, we update the minimum value and the index $i$ to $S_e(\rho_j^{(t)})$ and $j$, respectively, and run Grover's algorithm again; otherwise, $a_i$ is the splitting attribute of node $t$.
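
Select_Splitting_Node follows a quantum minimum-finding pattern. The Python sketch below only simulates its control flow classically: each Grover search is replaced by drawing uniformly at random from the indices whose expected entropy is below the current minimum, which matches the output of a successful search but not its cost. The function name and the list `expected_entropies` are hypothetical.

```python
import random

def select_splitting_node(expected_entropies, rng=random):
    """Classical simulation of Select_Splitting_Node's control flow.

    expected_entropies[i] stands for S_e(rho_{i+1}^(t)) (0-based indices).
    A Grover search is modelled by picking uniformly among the indices whose
    value is below the current minimum; an empty marked set models a failed
    search. This reproduces the control flow, not the quantum speedup.
    """
    d = len(expected_entropies)
    i = rng.randrange(d)                  # choose i uniformly at random
    s_emin = expected_entropies[i]
    while True:
        marked = [j for j in range(d) if expected_entropies[j] < s_emin]
        if marked:                        # search succeeds
            i = rng.choice(marked)
            s_emin = expected_entropies[i]
        else:                             # search fails: a_i is the splitting attribute
            return i

entropies = [0.92, 0.31, 0.55, 0.31, 0.78]
print(select_splitting_node(entropies, random.Random(0)) in (1, 3))  # True
```

Because each successful step strictly lowers $S_{e\min}$, the loop always terminates at an index holding the minimum; with ties, any of the tied indices may be returned.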

The algorithm requires the expected quantum entropy $S_e(\rho_i^{(t)})$ of attribute $a_i$. To obtain it, we need to design quantum circuits that compute the sum of variables and the von Neumann entropy of a quantum state. These circuits can be built on the basis of quantum mechanics; their construction is not detailed here.

2.4 Attributes data partition 

In a quantum decision tree, each decision outcome at a node is called a partition, since it corresponds to splitting a subset of the training data. The root node splits the full training dataset, and each successive decision splits a proper subset of the data. The number of splits at a node is closely related to the type of the attribute and can vary throughout the tree. For a discrete attribute, classical algorithms create a descendant of the current node for each possible value of the current attribute, and the training objects are then sorted to the appropriate descendant node.

Each attribute can be considered discrete in a quantum decision tree, since its values are vectors of a Hilbert space spanned by the attribute's basis states and can be regarded as members of an unordered set. An attribute value in the tree can be any linear combination of the attribute's basis states; thus, the set of possible attribute values has infinite cardinality, which invalidates the classical method for discrete attributes. Alternatively, we could extract the distinct values of an attribute from the training objects and then generate a descendant of the current node for each distinct attribute value. However, this approach requires that the distribution of the attribute values is not too sparse, namely, that the number of distinct attribute values is not too large; otherwise, the generalization of the quantum classifier will be worse. An extreme example is when the number of distinct attribute values equals the cardinality of the training dataset.

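A new attribute data partition method is proposed in this paper. Suppose we are given the training dataset $D^{(t)}$ at node $t$ and the set of attribute states for attribute $a_i \in A$: $D_i^{(t)} = \{|x_{i1}^{(t)}\rangle, |x_{i2}^{(t)}\rangle, \ldots, |x_{in_t}^{(t)}\rangle\}$, where $n_t$ is the number of attribute states for tree $t$. We have $D^{(t)} = \{(D_1^{(t)})^T, (D_2^{(t)})^T, \ldots, (D_d^{(t)})^T, (Y^{(t)})^T\}$. Let $D_i^{(dt)} = \{|x_{i1}^{(dt)}\rangle, |x_{i2}^{(dt)}\rangle, \ldots, |x_{in_{dt}}^{(dt)}\rangle\}$ denote the set of distinct attribute states of $D_i^{(t)}$, where $n_{dt}$ is the cardinality of $D_i^{(dt)}$; then $D_i^{(dt)} \subseteq D_i^{(t)}$ and $D_i^{(t)} \subseteq D_i$. For any attribute states $|x_{ij}^{(dt)}\rangle, |x_{ik}^{(dt)}\rangle \in D_i^{(dt)}$, we have $|x_{ij}^{(dt)}\rangle \ne |x_{ik}^{(dt)}\rangle$ if $j \ne k$. We partition the set $D_i^{(t)}$ into $t_i$ subclasses $D_{i1}^{(t)}, D_{i2}^{(t)}, \ldots, D_{ij}^{(t)}, \ldots, D_{it_i}^{(t)}$, where $D_{ij}^{(t)} \subseteq D_i^{(t)}$ and $\bigcup_{j=1}^{t_i} D_{ij}^{(t)} = D_i^{(t)}$. For any $D_{ij}^{(t)}, D_{ik}^{(t)} \subseteq D_i^{(t)}$, $D_{ij}^{(t)} \cap D_{ik}^{(t)} = \emptyset$ if $j \ne k$. For each $|x_{ik}^{(t)}\rangle \in D_i^{(t)}$, we can find one and only one subclass that contains it, i.e., $|x_{ik}^{(t)}\rangle \in D_{ij}^{(t)}$ for a unique $j$. For each subclass $D_{ij}^{(t)}$, we calculate its centroid $|xc_{ij}^{(t)}\rangle$, which represents the subclass, and the set of centroids is $XC_i^{(t)} = \{|xc_{i1}^{(t)}\rangle, |xc_{i2}^{(t)}\rangle, \ldots, |xc_{it_i}^{(t)}\rangle\}$. Meanwhile, the class state set $Y^{(t)}$ is also split into $t_i$ subsets $Y_i^{(t)} = \{Y_{i1}^{(t)}, Y_{i2}^{(t)}, \ldots, Y_{it_i}^{(t)}\}$, where $\bigcup_{j=1}^{t_i} Y_{ij}^{(t)} = Y^{(t)}$.

For each $|y_k^{(t)}\rangle \in Y^{(t)}$, we can find one and only one subset $Y_{ij}^{(t)} \in Y_i^{(t)}$ that contains $|y_k^{(t)}\rangle$. The training dataset $D^{(t)}$ is then also partitioned into $t_i$ subsets $D^{(t_{c1})}, D^{(t_{c2})}, \ldots, D^{(t_{cj})}, \ldots, D^{(t_{ct_i})}$, and the $j$th partition is $D^{(t_{cj})} = \{(D_{1j}^{(t)})^T, (D_{2j}^{(t)})^T, \ldots, (D_{dj}^{(t)})^T, (Y_{ij}^{(t)})^T\}$.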

Before partitioning an attribute, the data distribution pattern of the attribute is determined in advance. Given the attribute $a_i \in A$ and its attribute state set $D_i^{(t)}$ at node $t$, for an attribute state $|x_{ij}^{(t)}\rangle$ we call the ratio of the multiplicity of the state to the cardinality of the set the multiplicity ratio, denoted by $mr_{ij}^{(t)} = m_{ij}^{(t)}/n_t$, where $m_{ij}^{(t)}$ is the multiplicity of $|x_{ij}^{(t)}\rangle$ in $D_i^{(t)}$. We say that an attribute state is a large state if its multiplicity ratio is not smaller than the user-specified minimum multiplicity ratio (called minmr). The ratio of the number of large states to the number of distinct attribute states in the set is called the simple pattern ratio of attribute $a_i$ at node $t$, denoted by $spr_i^{(t)}$. The data pattern of an attribute at a node is a simple pattern if its simple pattern ratio is equal to or greater than the user-specified minimum simple pattern ratio (called minspr); otherwise, it is called a complex pattern.

To determine the data distribution of an attribute, we need to examine all distinct values in the training data of the given attribute. To achieve this, a quantum search algorithm needs $(1 + \sqrt{2} + \sqrt{3} + \cdots + \sqrt{n}) \approx \frac{2}{3} n^{3/2}$ Grover oracle calls, where $n$ is the number of values. On the other hand, sorting the states and then counting them is an alternative method, which generally makes $c\, n \log n$ comparisons to sort $n$ items in the classical world, where $c$ is a constant. For this reason, we simply use a classical sorting algorithm, such as Quicksort, to sort the attribute states of a node and then count the large states from beginning to end. Finally, we determine the data pattern by a simple computation. The algorithm is called Calculating_Pattern; we omit its description due to its triviality.
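
One possible sketch of the sort-and-count procedure just described, using the minmr and minspr definitions given earlier, is shown below. It assumes each attribute state is encoded as an orderable key such that copies of the same state compare equal (for example a label string); that encoding and the function name `calculating_pattern` are assumptions, not the paper's implementation.

```python
from itertools import groupby

def calculating_pattern(attribute_states, minmr, minspr):
    """Classify the data pattern of one attribute at a node.

    attribute_states: the n_t entries of D_i^(t) as orderable keys (assumed
    encoding; exact copies of one state must map to equal keys).
    Returns ("simple" | "complex", spr), spr being the simple pattern ratio.
    """
    n_t = len(attribute_states)
    # Sort so that equal states are adjacent, then count each run: O(n log n).
    multiplicities = [len(list(run)) for _, run in groupby(sorted(attribute_states))]
    n_dt = len(multiplicities)            # number of distinct states |D_i^(dt)|
    # A state is "large" when its multiplicity ratio m / n_t reaches minmr.
    n_large = sum(1 for m in multiplicities if m / n_t >= minmr)
    spr = n_large / n_dt
    return ("simple" if spr >= minspr else "complex"), spr

# Hypothetical attribute with n_t = 10 states and 3 distinct values.
states = ["|0>"] * 5 + ["|1>"] * 4 + ["|+>"]
print(calculating_pattern(states, minmr=0.2, minspr=0.6))  # ('simple', 0.6666666666666666)
```

Raising minspr to 0.7 in this example would classify the same attribute as a complex pattern, which is why these thresholds need tuning.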

For an attribute with a simple pattern, we simply assign each distinct attribute state to a unique subset. When an attribute $a_i$ is characterized by a complex pattern, we group the attribute states into $t_i$ clusters so that states in the same cluster are similar in some sense and dissimilar states are in different clusters. An important task in clustering is to select a similarity measure. Fidelity is a static measure of the closeness of two quantum states. Given two quantum states $|x_i\rangle$ and $|x_j\rangle$, their fidelity is

$$F(\rho, \sigma) = \mathrm{tr}\sqrt{\rho^{1/2} \sigma \rho^{1/2}} \qquad (4)$$

where $\rho = |x_i\rangle\langle x_i|$ and $\sigma = |x_j\rangle\langle x_j|$; for such pure states, Eq. (4) reduces to $F(\rho, \sigma) = |\langle x_i | x_j \rangle|$. Fidelity is symmetric, contractive under quantum operations, and strongly concave, and the angle $\arccos F(\rho, \sigma)$ is a metric. Fidelity is bounded between 0 and 1, i.e., $0 \le F(\rho, \sigma) \le 1$. The more similar two states are, the greater the fidelity between them; conversely, the more dissimilar two states are, the smaller their fidelity. In this paper, we take fidelity as the clustering criterion for quantum states.
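
The fidelity of Eq. (4) can be evaluated directly for small density matrices. This numpy sketch (helper names `psd_sqrt`, `fidelity`, and `pure_fidelity` are hypothetical) computes the general formula through an eigendecomposition-based matrix square root and checks it against the pure-state form $|\langle x_i|x_j\rangle|$.

```python
import numpy as np

def psd_sqrt(m):
    """Square root of a positive semidefinite Hermitian matrix via eigh."""
    w, v = np.linalg.eigh(m)
    w = np.clip(w, 0.0, None)            # remove tiny negative round-off
    return (v * np.sqrt(w)) @ v.conj().T

def fidelity(rho, sigma):
    """F(rho, sigma) = tr sqrt(rho^{1/2} sigma rho^{1/2}) (Eq. (4))."""
    r = psd_sqrt(rho)
    return float(np.real(np.trace(psd_sqrt(r @ sigma @ r))))

def pure_fidelity(x, y):
    """For pure states Eq. (4) reduces to |<x|y>|."""
    return float(abs(np.vdot(x, y)))

x = np.array([1, 0], dtype=complex)                 # |0>
y = np.array([1, 1j], dtype=complex) / np.sqrt(2)   # (|0> + i|1>)/sqrt(2)
rho, sigma = np.outer(x, x.conj()), np.outer(y, y.conj())
print(round(fidelity(rho, sigma), 6), round(pure_fidelity(x, y), 6))  # 0.707107 0.707107
```

For the pure attribute states used in this paper, the inner-product form is all that is needed; the general formula matters only for mixed states.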

Given the set $D_i^{(t)}$ of $n_t$ attribute states of attribute $a_i$ at node $t$, the clustering process consists of two steps. In the first step, the pairs of states with the largest similarity are searched for; we then create a cluster for each of the states found and mark that state as the centroid of its cluster. In the second step, each noncentroid state is assigned to its most similar cluster. The algorithm is described as follows:


Algorithm Clustering_Attribute(Tree t, Attribute i)
    if Calculating_Pattern(t, i) = simple pattern then
        put $|x_{ij}^{(t)}\rangle \in D_i^{(t)}$ into $D_{ik}^{(t)}$ where $|x_{ij}^{(t)}\rangle = |x_{ik}^{(dt)}\rangle$, for each $k$;
    else
        set $F_{\max} = 0$;
        for each state $|x_{ij}^{(dt)}\rangle \in D_i^{(dt)}$ do
            use Grover's algorithm to find a state $|x_{ik}^{(dt)}\rangle$, $k \ne j$, where $F(\rho_{ij}^{(dt)}, \rho_{ik}^{(dt)})$ is maximum;
            if $F(\rho_{ij}^{(dt)}, \rho_{ik}^{(dt)}) > F_{\max}$ then
                clear $XC_i^{(t)}$;
                set $F_{\max} = F(\rho_{ij}^{(dt)}, \rho_{ik}^{(dt)})$;
                put $|x_{ij}^{(dt)}\rangle$ into $XC_i^{(t)}$;
            elseif $F(\rho_{ij}^{(dt)}, \rho_{ik}^{(dt)}) = F_{\max}$ then
                put $|x_{ij}^{(dt)}\rangle$ into $XC_i^{(t)}$;
            end if
        end for
        for each state $|x_{ij}^{(dt)}\rangle \in D_i^{(dt)}$ with $|x_{ij}^{(dt)}\rangle \notin XC_i^{(t)}$ do
            use Grover's algorithm to find a centroid $|xc_{ik}^{(t)}\rangle \in XC_i^{(t)}$ that maximizes the fidelity between it and $|x_{ij}^{(dt)}\rangle$;
            put $|x_{ij}^{(dt)}\rangle$ into the cluster $D_{ik}^{(t)}$ with centroid $|xc_{ik}^{(t)}\rangle$;
        end for
    end if

In the algorithm Clustering_Attribute, $F_{\max}$ denotes the largest fidelity found so far, and $\rho_{ij}^{(dt)} = |x_{ij}^{(dt)}\rangle\langle x_{ij}^{(dt)}|$ and $\rho_{ik}^{(dt)} = |x_{ik}^{(dt)}\rangle\langle x_{ik}^{(dt)}|$ are the density operators of the attribute states $|x_{ij}^{(dt)}\rangle$ and $|x_{ik}^{(dt)}\rangle$, respectively, where $|x_{ik}^{(dt)}\rangle \in D_i^{(dt)}$. The distinct state set $D_i^{(dt)}$ is computed in advance by the subroutine Calculating_Pattern: $D_i^{(dt)} = \{\text{distinct } x \mid x \in D_i^{(t)}\}$. For a simple pattern, every member of a cluster is the same state, and the centroid of a cluster is just its member. For a complex pattern, we first search for the most similar (largest-fidelity) state to each state in $D_i^{(dt)}$, which yields the centroids; we then assign each remaining state to an appropriate group by computing the fidelity between it and each centroid in $XC_i^{(t)}$.
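
The complex-pattern branch of Clustering_Attribute can be simulated classically to see what it computes. This sketch follows the pseudocode literally, with these assumptions: each Grover search is replaced by an exhaustive search, states are pure and given as numpy vectors, ties with $F_{\max}$ are detected with a small tolerance rather than exact equality, at least two distinct states are present, and `cluster_complex` is a hypothetical name.

```python
import numpy as np

def fidelity(x, y):
    """Pure-state fidelity |<x|y>|, which equals Eq. (4) for pure states."""
    return abs(np.vdot(x, y))

def cluster_complex(states, tol=1e-12):
    """Classical sketch of the complex-pattern branch of Clustering_Attribute.

    states: the distinct attribute states D_i^(dt) as normalized vectors (n >= 2).
    Returns (centroids, assignment): centroid indices forming XC_i^(t), and a
    dict mapping every state index to the index of its cluster's centroid.
    """
    n = len(states)
    f_max, centroids = 0.0, []
    # Step 1: exhaustive search replaces Grover; the partner k must differ from j.
    for j in range(n):
        f = max(fidelity(states[j], states[k]) for k in range(n) if k != j)
        if f > f_max + tol:                  # new largest fidelity: clear XC
            f_max, centroids = f, [j]
        elif abs(f - f_max) <= tol:          # tie with F_max: also a centroid
            centroids.append(j)
    # Step 2: each non-centroid joins the centroid with the largest fidelity.
    assignment = {c: c for c in centroids}
    for j in range(n):
        if j not in assignment:
            assignment[j] = max(centroids, key=lambda c: fidelity(states[j], states[c]))
    return centroids, assignment

# Hypothetical distinct states: |0>, a state close to |0>, and |1>.
states = [np.array([1.0, 0.0]),
          np.array([np.cos(0.1), np.sin(0.1)]),
          np.array([0.0, 1.0])]
print(cluster_complex(states))   # ([0, 1], {0: 0, 1: 1, 2: 1})
```

With this input, the two most similar states become the centroids and $|1\rangle$ joins the nearer of them, which is what the pseudocode prescribes when centroids are chosen by largest fidelity.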

Let $DC_i^{(t)} = \{D_{i1}^{(t)}, D_{i2}^{(t)}, \ldots, D_{ij}^{(t)}, \ldots, D_{it_i}^{(t)}\}$ be the set of clusters obtained by assigning attribute states (for a simple pattern) or by clustering the items of an attribute (for a complex pattern), where $\bigcup_{j=1}^{t_i} D_{ij}^{(t)} = D_i^{(t)}$ and $D_{ij}^{(t)}$ is the $j$th cluster of attribute $a_i$ at node $t$ of a quantum decision tree. We then partition the training data of attribute $a_i$ at node $t$ into subspaces. The attribute states in a cluster $D_{ij}^{(t)}$ are all equal or similar. The centroid of a cluster $D_{ij}^{(t)}$ is denoted by $|xc_{ij}^{(t)}\rangle$, which is an actual state belonging to $D_{ij}^{(t)}$. For a simple-pattern attribute, the centroid of a cluster is the member of the cluster. For a complex pattern, the centroid of an attribute subpartition $D_{ij}^{(t)}$ is the state that belongs to the intersection of $D_{ij}^{(t)}$ and the set of centroids $XC_i^{(t)}$, i.e., $\{|xc_{ij}^{(t)}\rangle\} = D_{ij}^{(t)} \cap XC_i^{(t)}$.

2.5 Constructing quantum decision tree

We now construct the quantum decision tree $t$. First, for each attribute $a_i$, we cluster the data into $t_i$ clusters $D_{i1}^{(t)}, D_{i2}^{(t)}, \ldots, D_{it_i}^{(t)}$, and these clusters have distinct centroids $|xc_{i1}^{(t)}\rangle, |xc_{i2}^{(t)}\rangle, \ldots, |xc_{it_i}^{(t)}\rangle$, which represent the clusters, respectively. The set of clusters is denoted by $DC_i^{(t)}$, so we obtain $d$ cluster sets: $DC_1^{(t)}, DC_2^{(t)}, \ldots, DC_d^{(t)}$. Second, according to Eq. (3), we calculate for each attribute the expected quantum entropy after $t$ is split, and we choose the attribute with the smallest expected quantum entropy as the splitting node. Suppose the splitting attribute is $a_i$; we then split $t$ into $t_i$ descendant nodes and label the edges from node $t$ to the descendant nodes with $|xc_{i1}^{(t)}\rangle, |xc_{i2}^{(t)}\rangle, \ldots, |xc_{it_i}^{(t)}\rangle$, respectively.

The process of choosing a new attribute and dividing the training data is then repeated for each internal child node, using only the attribute states associated with that child node. This process continues until either of two stopping criteria is met: (1) every attribute has already been included along this path through the tree, or (2) the attribute states associated with the current node all have the same target attribute state (i.e., their quantum entropy is zero).

We then reach a leaf node and assign a class label to it; this is the simplest step in tree construction. The process of constructing a quantum decision tree is described as follows:

Algorithm Quantum_Decision_Tree_Classifier(D, A, C)
    create a node $t$;
    if any of the stopping criteria is met then
        mark $t$ as a leaf node;
        set Selecting_Class(D.Y) to $t$;
    else
        for each $a_i \in A$ do Clustering_Attribute($t$, $i$);
        set $j$ = Select_Splitting_Node($t$);
        set $t$.attribute = $j$;
        for each $|xc_{ji}^{(t)}\rangle \in XC_j^{(t)}$ do
            $t_{ci}$ = Quantum_Decision_Tree_Classifier($D^{(t_{ci})}$, A, C);
            connect $t$ to $t_{ci}$ with an edge labeled by $|xc_{ji}^{(t)}\rangle$;
        end for
    end if
    return $t$;

In the algorithm, $j$ means that $a_j$ is chosen as the splitting attribute at node $t$, $t_{ci}$ is the $i$th child tree of $t$, and $D^{(t_{ci})}$ is the $i$th subset of the training dataset $D$. Under the ideal stopping criteria, each training object will be classified perfectly and each leaf node will correspond to the lowest quantum entropy impurity. While this is sometimes a reasonable strategy, it can in fact lead to difficulties when there is noise in the data: the training data will typically be overfit. Conversely, if splitting is stopped too early, the error on the training data is not sufficiently low and performance may suffer. We therefore set a small threshold value on the reduction in quantum entropy, and splitting is stopped if the best candidate split at a node reduces the impurity by less than that preset amount. Consequently, when the construction reaches a leaf node, the set of target classes may contain more than one class; the function Selecting_Class then chooses the class state that occurs most often among the class states at node $t$. The input parameter D.Y of the function denotes the set of class states of a training dataset $D$.

Besides the quantum entropy impurity threshold, the minimum multiplicity ratio and
the minimum simple pattern ratio are also used in the subfunctions of the algorithm
Quantum_Decision_Tree_Classifier. To select good values for these thresholds, we need
to tune them in advance.
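The construction procedure and the threshold-based stopping rule above can be sketched as a classical simulation. This is a minimal sketch under several assumptions that are not taken from the paper: each object's attribute states have already been mapped by Clustering_Attribute to centroid indices, class states are given as state vectors, the quantum entropy impurity is taken to be the von Neumann entropy of the uniform mixture $\rho = \frac{1}{N}\sum_m |y_m\rangle\langle y_m|$ of the class states at a node, and the attribute with the smallest expected entropy after splitting is chosen. The names `build_tree`, `entropy_of` and `selecting_class`, and the data layout, are hypothetical.

```python
import numpy as np
from collections import Counter


def entropy_of(class_states):
    # Assumption: impurity = von Neumann entropy S(rho) = -Tr(rho log2 rho)
    # of the uniform mixture of the class states present at the node.
    rho = sum(np.outer(y, np.conj(y)) for y in class_states) / len(class_states)
    eigvals = np.linalg.eigvalsh(rho)
    eigvals = eigvals[eigvals > 1e-12]  # treat 0 log 0 as 0
    return float(-np.sum(eigvals * np.log2(eigvals)))


def selecting_class(labels):
    # Leaf label: the class state that occurs most often at the node.
    return Counter(labels).most_common(1)[0][0]


def build_tree(data, attributes, class_vecs, threshold=1e-3):
    """data: list of (cluster_indices, label); cluster_indices[i] is the centroid
    index assigned to attribute a_i (hypothetical layout).
    attributes: set of attribute indices still unused on this path.
    class_vecs: dict mapping each label to its class state vector."""
    labels = [y for _, y in data]
    impurity = entropy_of([class_vecs[y] for y in labels])
    # Stopping criteria: no attributes left, or zero quantum entropy.
    if not attributes or impurity < 1e-9:
        return {"leaf": selecting_class(labels)}

    def expected_entropy(i):
        groups = {}
        for x, y in data:
            groups.setdefault(x[i], []).append(y)
        return sum(len(g) / len(data) * entropy_of([class_vecs[y] for y in g])
                   for g in groups.values())

    j = min(sorted(attributes), key=expected_entropy)  # largest entropy reduction
    if impurity - expected_entropy(j) < threshold:
        return {"leaf": selecting_class(labels)}  # split too weak: stop early
    children = {}
    for c in sorted({x[j] for x, _ in data}):
        subset = [(x, y) for x, y in data if x[j] == c]
        children[c] = build_tree(subset, attributes - {j}, class_vecs, threshold)
    return {"attribute": j, "children": children}


class_vecs = {"A": np.array([1.0, 0.0]), "B": np.array([0.0, 1.0])}
data = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "B"), ((1, 1), "B")]
print(build_tree(data, {0, 1}, class_vecs))
```

With orthonormal class states, as in the usage example, the von Neumann entropy reduces to the Shannon entropy of the class frequencies; non-orthogonal class states give an impurity that is no larger.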


2.6 Running time analysis 

For simplicity, we analyze only one case of the algorithm that constructs a
quantum decision tree, under the following restrictions:

(1) The tree is a full quantum decision $k$-tree with $n$ training data.

(2) All attribute states for each attribute are mutually exclusive.

(3) Error bounds of the algorithm are not considered.

We first analyze the running time of the sub-algorithms Select_Splitting_Node and
Clustering_Attribute.

For the algorithm Select_Splitting_Node, the expected quantum entropy is computed
$k$ times. The running time of the algorithm is $T_1 = c_1 k\sqrt{d}$, because it needs
$c_1\sqrt{d}$ queries to find the most appropriate splitting node, where $c_1$ is a constant.
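The $\sqrt{d}$ factor in $T_1$ reflects a Grover-type search over the $d$ candidate attributes. As a rough illustration (an assumption about where the scaling comes from, not the paper's own derivation), Grover's algorithm with one marked item among $N$ needs about $\lfloor \frac{\pi}{4}\sqrt{N} \rfloor$ iterations. The helper names below are hypothetical and $c_1$ is a placeholder constant.

```python
import math


def grover_iterations(num_items: int) -> int:
    # Roughly optimal number of Grover iterations for one marked item among num_items.
    return math.floor(math.pi / 4 * math.sqrt(num_items))


def t1(k: int, d: int, c1: float = 1.0) -> float:
    # T1 = c1 * k * sqrt(d) for Select_Splitting_Node (c1 is a placeholder constant).
    return c1 * k * math.sqrt(d)


for d in (16, 100, 10_000):
    print(d, grover_iterations(d), t1(k=3, d=d))
```

Quadrupling $d$ only doubles the search cost, which is the quadratic speedup over a linear scan of the attributes.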

For the algorithm Clustering_Attribute, computing the data pattern requires sorting
the training data and then counting them in order, which takes $c_2 n\log n + c_3 n$ time,
where $c_2$ and $c_3$ are constants. Let $|x_1^{(i)}\rangle$ be the first member of the training
data for attribute $a_i$ at the beginning of the clustering process; it requires $\sqrt{n}$
queries to search for an attribute state. For the second member, we need $\sqrt{n-1}$
queries, since we do not need to examine the state $|x_1^{(i)}\rangle$ at this time. Similarly,
for the third state we query $\sqrt{n-2}$ times, and so on, so the first step of clustering
needs

$$\sqrt{n} + \sqrt{n-1} + \cdots + \sqrt{1} < \frac{2}{3}(n+1)^{3/2}$$

queries.

At the second step, the quantum algorithm takes time $c_4(n-k)\sqrt{k}$, where $c_4$ is a
constant. The running time is then

$$T_2 = c_2 n\log n + c_3 n + \frac{2}{3}(n+1)^{3/2} + c_4(n-k)\sqrt{k}.$$
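The query count of the first clustering step and the resulting $T_2$ can be checked numerically. This is a sketch: the constants $c_2$, $c_3$, $c_4$ are unspecified in the text and set to 1 here as placeholders, the logarithm is taken to base 2 (which only changes $c_2$), and the bound $\frac{2}{3}(n+1)^{3/2}$ on $\sqrt{n} + \sqrt{n-1} + \cdots + \sqrt{1}$ follows from $\sqrt{m} \le \int_m^{m+1}\sqrt{x}\,dx$. The function names are hypothetical.

```python
import math


def first_step_queries(n: int) -> float:
    # sqrt(n) + sqrt(n-1) + ... + sqrt(1): each later search skips states already examined
    return sum(math.sqrt(m) for m in range(1, n + 1))


def sum_bound(n: int) -> float:
    # Upper bound from sqrt(m) <= integral of sqrt(x) over [m, m+1]
    return 2 / 3 * (n + 1) ** 1.5


def t2(n: int, k: int, c2: float = 1.0, c3: float = 1.0, c4: float = 1.0) -> float:
    # T2 = c2 n log n + c3 n + (2/3)(n+1)^{3/2} + c4 (n - k) sqrt(k); constants are placeholders
    return c2 * n * math.log2(n) + c3 * n + sum_bound(n) + c4 * (n - k) * math.sqrt(k)


for n in (10, 100, 1000):
    assert first_step_queries(n) < sum_bound(n)
print(t2(n=1024, k=4))
```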

Now, we calculate the running time of the whole algorithm. At the root node (level
0) of the tree, the query time for each attribute is $T_1 + T_2$. Thus, for level 0, the
running time is $d(T_1 + T_2)$. For the full $k$-tree, each of the $k$ branches contains $1/k$
of the training objects. Accordingly, we need $\frac{1}{\sqrt{k}}d(T_1 + T_2)$ queries in level 1.
For level 2, the query time is $\frac{1}{k}d(T_1 + T_2)$, and so on. Summing over the $\log_k n$
levels in the tree, the total running time of the algorithm is

$$\sum_{l=0}^{\log_k n - 1} \frac{d(T_1 + T_2)}{(\sqrt{k})^{l}} = \frac{\sqrt{k}(\sqrt{n}-1)}{\sqrt{n}(\sqrt{k}-1)}\, d(T_1 + T_2).$$
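The level-by-level sum and its closed form can be compared numerically. This sketch assumes, following the text, that level $l$ costs $d(T_1+T_2)/(\sqrt{k})^l$ and that splitting happens on levels $0, \ldots, \log_k n - 1$; $n$ is taken to be an exact power of $k$, and the function names are hypothetical.

```python
import math


def total_by_levels(n: int, k: int, d: int, t1: float, t2: float) -> float:
    levels = round(math.log(n, k))  # number of splitting levels, log_k n
    # level l costs d (T1 + T2) / sqrt(k)^l
    return sum(d * (t1 + t2) / math.sqrt(k) ** l for l in range(levels))


def total_closed_form(n: int, k: int, d: int, t1: float, t2: float) -> float:
    # sqrt(k)(sqrt(n) - 1) / (sqrt(n)(sqrt(k) - 1)) * d (T1 + T2)
    sk, sn = math.sqrt(k), math.sqrt(n)
    return sk * (sn - 1) / (sn * (sk - 1)) * d * (t1 + t2)


print(total_by_levels(64, 4, d=5, t1=1.0, t2=2.0),
      total_closed_form(64, 4, d=5, t1=1.0, t2=2.0))
```

As $n$ grows, the factor approaches $\sqrt{k}/(\sqrt{k}-1)$, so under this cost model the total stays within a constant factor of the root-level cost $d(T_1+T_2)$.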

3 Searching over the decision tree

Given a new object $|x_{new}\rangle = |x_{1,new}\rangle \otimes |x_{2,new}\rangle \otimes \cdots \otimes |x_{d,new}\rangle$, the
classification begins at the root node $t$ of a quantum decision tree. Let the attribute of
the root node be $a_i$; we calculate the fidelities between $|x_{i,new}\rangle$ and the centroids of
the branches of the root node: $F(\rho_{i,new}, \rho_{i,1}), F(\rho_{i,new}, \rho_{i,2}), \ldots, F(\rho_{i,new}, \rho_{i,j}), \ldots$,
where $\rho_{i,new} = |x_{i,new}\rangle\langle x_{i,new}|$ and $\rho_{i,j} = |xc_j^{(i)}\rangle\langle xc_j^{(i)}|$. Supposing the $j$th
branch, which has the largest fidelity, is selected, we then follow the $j$th branch to a
descendant node $t_{c_j}$. The second step is to make the decision at the child node $t_{c_j}$,
which can be considered the root of a subtree. We continue in this way until we reach
a leaf node, which has no further test. The new object is then assigned the class of the
leaf node reached.
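The branch-selection step can be sketched classically for pure states. For pure states the fidelity reduces to the overlap $|\langle x_{i,new}|xc_j^{(i)}\rangle|$ (some authors use its square; the selected branch is the same either way). The sketch uses a plain argmax in place of the quantum search, and the function names are hypothetical.

```python
import numpy as np


def fidelity(psi: np.ndarray, phi: np.ndarray) -> float:
    # For pure states rho = |psi><psi| and sigma = |phi><phi|, F(rho, sigma) = |<psi|phi>|.
    return float(abs(np.vdot(psi, phi)))


def choose_branch(x_attr: np.ndarray, centroids: list) -> int:
    # Index j of the centroid |xc_j> with the largest fidelity to |x_{i,new}>.
    return int(np.argmax([fidelity(x_attr, c) for c in centroids]))


zero = np.array([1.0, 0.0])
one = np.array([0.0, 1.0])
plus = (zero + one) / np.sqrt(2)
x_new_i = np.cos(0.1) * zero + np.sin(0.1) * one
print(choose_branch(x_new_i, [zero, one, plus]))
```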

A quantum decision tree is an ordered list. Theoretically, a tree with an arbitrary
branching factor at different nodes can always be represented by a functionally
equivalent binary tree. However, multibranch decision trees are still very popular,
since the rules derived from them are, in many cases, more readable and
understandable to humans. We now describe the algorithm for searching over the
decision tree:

Algorithm Searching_Quantum_Tree(Tree t, Object $|x_{new}\rangle$)
    if the root node of t is a leaf node then
        return the class label of t;
    else
        set i = t.attribute, where $a_i \in A$;
        repeat Grover's algorithm to search for the j in X for which
            $F(|x_{i,new}\rangle, |xc_j^{(i)}\rangle)$ is the largest;
        return Searching_Quantum_Tree($t_{c_j}$, $|x_{new}\rangle$);
    end if
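A classical simulation of Searching_Quantum_Tree can be sketched as follows. It mirrors the recursion but replaces Grover's algorithm with an explicit maximum over the fidelities, so it shows the control flow rather than the quantum speedup. The dictionary layout of the tree (`label` for leaves; `attribute`, `centroids` and `children` for internal nodes) is a hypothetical representation.

```python
import numpy as np


def search_quantum_tree(t: dict, x_new: list) -> str:
    """x_new[i] is the state vector |x_{i,new}> of attribute a_i."""
    if "label" in t:  # leaf node: return its class label
        return t["label"]
    i = t["attribute"]
    # classical stand-in for Grover search: branch whose centroid has the largest fidelity
    fids = [abs(np.vdot(c, x_new[i])) for c in t["centroids"]]
    j = int(np.argmax(fids))
    return search_quantum_tree(t["children"][j], x_new)


zero, one = np.array([1.0, 0.0]), np.array([0.0, 1.0])
tree = {
    "attribute": 0, "centroids": [zero, one],
    "children": [
        {"label": "A"},
        {"attribute": 1, "centroids": [zero, one],
         "children": [{"label": "B"}, {"label": "C"}]},
    ],
}
print(search_quantum_tree(tree, [one, zero]))
```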

In the algorithm Searching_Quantum_Tree, $t_{c_j}$ denotes the $j$th subtree of $t$,
and $X = \{|xc_1^{(i)}\rangle, |xc_2^{(i)}\rangle, \ldots\}$ represents the set of centroids of all
sub-branches of tree $t$.

Supposing that the quantum decision tree is a full $k$-tree with $n$ training data,
the tree contains $(kn-1)/(k-1)$ nodes and has height $h = \log_k n + 1$. Since the
algorithm Searching_Quantum_Tree uses Grover's algorithm to search for the most
probable branch at each node, it needs $c_5\sqrt{k}$ queries per node, and therefore has
a running time of $c_5\sqrt{k}\log_k n$ if there are no errors, where $c_5$ is a
constant.
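The node count and the search cost for a full $k$-tree can be checked with a few lines of arithmetic. This sketch assumes $n$ is an exact power of $k$ and uses a placeholder value for the constant $c_5$; the function names are hypothetical.

```python
import math


def full_k_tree_stats(n: int, k: int) -> tuple:
    depth = round(math.log(n, k))  # leaves sit at depth log_k n
    nodes = sum(k ** l for l in range(depth + 1))  # 1 + k + ... + n
    assert nodes == (k * n - 1) // (k - 1)
    return nodes, depth + 1  # (number of nodes, height h)


def search_queries(n: int, k: int, c5: float = 1.0) -> float:
    # c5 * sqrt(k) queries per internal node along one root-to-leaf path of log_k n nodes
    return c5 * math.sqrt(k) * math.log(n, k)


print(full_k_tree_stats(8, 2), search_queries(8, 2))
```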


4 Conclusion

In this paper, we have investigated quantum decision tree classification and
described a new, simple learning model that can grow a quantum decision tree for
classifying quantum objects. To construct a quantum tree, we used the von Neumann
entropy instead of the classical Shannon entropy as the splitting criterion to determine
which attribute to split on. We introduced a quantum clustering algorithm to discretize
the quantum training data.

The study of machine learning in the quantum world is still at a budding stage.
Many research topics related to decision trees were not addressed in this paper. These
open problems include the following: (1) how to analyze the error bound of a quantum
decision tree; (2) pruning strategies to stop splitting; (3) how to construct quantum
decision trees when some attribute samples are missing; (4) the problem of attributes
with differing costs; (5) the problem of training data with quantum noise; and (6) how
to generate the quantum classifier if the training data are composed of mixed states
instead of pure states. In addition, for a supervised learning method, it is important
that the algorithm's performance be demonstrated experimentally. In the classical
world, experiments can be carried out conveniently, since plenty of data repositories
are publicly available online. In contrast, this is still uncultivated land in the quantum
field.